[DEV-1434] Wire eval command to judge-facts cross-session memory #12
alexeyzimarev merged 2 commits into main
Conversation
Third leg of DEV-1434: the CLI now fetches retained judge facts per category at eval startup, injects them into each judge prompt as "known patterns", parses an optional `retain_fact` from each judge response, and POSTs novel facts back to the server for future runs.

- **Models.cs**: `JudgeFactPayload` (write) and `JudgeFact` (read), registered in the source-gen JSON context as `List<JudgeFact>` + `JudgeFactPayload`.
- **EvalCommand**: `FetchAllJudgeFactsAsync` loads all four categories at startup and stores them per-category for per-question injection; a fetch failure for any single category is logged and skipped (non-fatal). `PostJudgeFactAsync` runs after each judge when `retain_fact` is non-null and non-empty.
- **ExtractRetainFact** is a standalone parser that tolerates code fences and rejects null/undefined/empty/whitespace/non-string values — independent of `ParseVerdict`, so retained-fact plumbing doesn't regress if verdict parsing ever fails.
- **FormatKnownPatterns** renders the facts as a bulleted block, with an explicit "(none yet)" marker when empty so the prompt still reads naturally on a fresh system.
- The prompt template grew a "Known patterns" section and a `retain_fact` field in the response schema, with strict guidance on when to emit one (only generalizable patterns, never single observations).

10 new `EvalCommandTests` covering `FormatKnownPatterns` (empty + populated) and `ExtractRetainFact` (present, fenced, absent, null, empty, whitespace, non-string, malformed JSON). Full suite 205/205, AOT publish clean.

Depends on the server-side endpoints in #475 (kurrent-io/Kurrent.Capacitor).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
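The fence-tolerant parsing described above can be sketched roughly as follows. This is a hypothetical reconstruction from the description, not the PR's actual code (the real method lives in `EvalCommand.cs`):

```csharp
using System;
using System.Text.Json;

// Rough sketch of ExtractRetainFact per the description: strip an optional
// Markdown code fence, then accept "retain_fact" only when it is a non-empty,
// non-whitespace JSON string; anything else (absent, null, non-string,
// malformed JSON) yields null.
static string? ExtractRetainFact(string response) {
    var text = response.Trim();
    if (text.StartsWith("```")) {
        var firstNewline = text.IndexOf('\n');
        var lastFence = text.LastIndexOf("```", StringComparison.Ordinal);
        if (firstNewline >= 0 && lastFence > firstNewline)
            text = text[(firstNewline + 1)..lastFence].Trim();
    }
    try {
        using var doc = JsonDocument.Parse(text);
        if (doc.RootElement.ValueKind == JsonValueKind.Object &&
            doc.RootElement.TryGetProperty("retain_fact", out var fact) &&
            fact.ValueKind == JsonValueKind.String &&
            !string.IsNullOrWhiteSpace(fact.GetString()))
            return fact.GetString();
    } catch (JsonException) {
        // Malformed JSON: nothing to retain.
    }
    return null;
}

var fenced = "```json\n{\"retain_fact\": \"Prefers rebase over merge\"}\n```";
Console.WriteLine(ExtractRetainFact(fenced));                                   // Prefers rebase over merge
Console.WriteLine(ExtractRetainFact("{\"retain_fact\": 42}") ?? "(rejected)");  // (rejected)
```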
Replace .IsEqualTo(true) on nullable bool JSON reads with .IsTrue() (per TUnit analyzer suggestion). Coerce the nullable via `?? false` so the assertion documents "this must be true" rather than "this must equal true, or null is fine" — both match the original intent since the test setup writes enabledPlugins as `true`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review Summary by Qodo

Wire eval command to judge-facts cross-session memory
Walkthrough

Description
- Implement cross-session judge-facts memory for eval command: fetch retained facts per category at eval startup, inject facts into judge prompts as a "known patterns" section, parse optional `retain_fact` from judge responses, POST novel facts back to the server for future runs
- Add `FormatKnownPatterns` and `ExtractRetainFact` helper methods: format facts as a bulleted list with an empty-state marker; tolerant parsing of the `retain_fact` field (handles code fences, null, empty, malformed JSON)
- Extend prompt template with "Known patterns" section and `retain_fact` response field
- Add 10 new unit tests covering fact formatting and extraction edge cases
- Fix 4 pre-existing TUnitAssertions0015 warnings in SetupCommandTests

Diagram

```mermaid
flowchart LR
    A["Eval startup"] -->|FetchAllJudgeFactsAsync| B["Load facts per category"]
    B -->|FormatKnownPatterns| C["Inject into judge prompt"]
    D["Judge response"] -->|ExtractRetainFact| E["Parse retain_fact field"]
    E -->|PostJudgeFactAsync| F["POST to /api/judge-facts"]
    F -->|Future evals| A
```
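The empty-state marker mentioned in the walkthrough suggests a helper along these lines. A minimal sketch, not the PR's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of FormatKnownPatterns: a bulleted block per fact, with
// an explicit "(none yet)" marker so a fresh system's prompt still reads well.
static string FormatKnownPatterns(IReadOnlyList<string> facts) =>
    facts.Count == 0
        ? "(none yet)"
        : string.Join("\n", facts.Select(f => $"- {f}"));

Console.WriteLine(FormatKnownPatterns(new List<string>()));
Console.WriteLine(FormatKnownPatterns(new List<string> { "Judge over-penalizes terse answers" }));
```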
File Changes
1. src/kapacitor/Commands/EvalCommand.cs
Code Review by Qodo
```csharp
// If the judge emitted a retain_fact, persist it for future evals.
if (ExtractRetainFact(result.Result) is { } retainedFact) {
    await PostJudgeFactAsync(httpClient, baseUrl, q.Category, retainedFact, context.SessionId, evalRunId);
}
```
1. Retain_fact skipped on failure 🐞 Bug ≡ Correctness
HandleEval only persists retain_fact after ParseVerdict succeeds, so any judge response that includes a usable retain_fact but fails verdict parsing is silently not retained. This contradicts ExtractRetainFact’s intent that retention plumbing “doesn't depend on verdict parsing succeeding.”
Agent Prompt
### Issue description
`retain_fact` persistence is currently gated on `ParseVerdict` succeeding. If the judge returns a response where `retain_fact` is present/valid but the verdict JSON is unparseable or fails validation (e.g., missing/out-of-range score), the loop `continue`s before calling `ExtractRetainFact`, so the fact is never retained.
### Issue Context
`ExtractRetainFact` is documented as independent of `ParseVerdict` so retention shouldn't depend on verdict parsing succeeding.
### Fix
Refactor the per-question loop to extract `retain_fact` regardless of verdict parse outcome, and only gate `verdicts.Add(...)` on `ParseVerdict`.
A typical structure:
- `var retainedFact = ExtractRetainFact(result.Result);`
- `var verdict = ParseVerdict(...);`
- if verdict != null -> add
- if retainedFact != null -> post
### Fix Focus Areas
- src/kapacitor/Commands/EvalCommand.cs[105-144]
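The suggested structure could look roughly like this. A self-contained sketch with stubbed `ExtractRetainFact`/`ParseVerdict` (simplified to plain JSON lookups here), not the PR's actual loop:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stubbed versions of the two parsers, just to show the ordering.
static string? ExtractRetainFact(string json) {
    try {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty("retain_fact", out var f) &&
               f.ValueKind == JsonValueKind.String && !string.IsNullOrWhiteSpace(f.GetString())
            ? f.GetString() : null;
    } catch (JsonException) { return null; }
}

static int? ParseVerdict(string json) {
    try {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty("score", out var s) && s.ValueKind == JsonValueKind.Number
            ? s.GetInt32() : null;
    } catch (JsonException) { return null; }
}

// A response with a usable fact but no parseable verdict: the fact survives.
var response = "{\"retain_fact\": \"User force-pushes with uncommitted work\"}";
var retainedFact = ExtractRetainFact(response);   // extracted unconditionally
var verdict = ParseVerdict(response);             // null: no score field

var verdicts = new List<int>();
if (verdict is not null) verdicts.Add(verdict.Value);   // only scoring is gated
if (retainedFact is not null) Console.WriteLine($"retained: {retainedFact}");
```

The point of the ordering: neither extraction depends on the other, so a `continue` on a bad verdict can no longer drop a valid fact.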
```csharp
try {
    using var resp = await httpClient.GetWithRetryAsync($"{baseUrl}/api/judge-facts?category={category}");
    if (!resp.IsSuccessStatusCode) {
        Log($"Failed to fetch judge facts for {category}: HTTP {(int)resp.StatusCode}");
        continue;
    }

    var json = await resp.Content.ReadAsStringAsync();
    var list = JsonSerializer.Deserialize(json, KapacitorJsonContext.Default.ListJudgeFact) ?? [];
    result[category] = list;
    Log($"Loaded {list.Count} retained facts for category {category}");
} catch (HttpRequestException ex) {
    Log($"Could not load judge facts for {category}: {ex.Message}");
}
```
2. Judge-facts json aborts eval 🐞 Bug ☼ Reliability
FetchAllJudgeFactsAsync only catches HttpRequestException; malformed/non-matching JSON from /api/judge-facts can throw JsonException from JsonSerializer.Deserialize and crash eval startup. This violates the comment that fact-fetch failures don’t abort the run.
Agent Prompt
### Issue description
`FetchAllJudgeFactsAsync` can throw `JsonException` when `/api/judge-facts` returns malformed JSON (or an unexpected shape). The method currently only catches `HttpRequestException`, so this exception will bubble up and abort `HandleEval` before any questions run.
### Issue Context
The comment in `HandleEval` says failures fetching retained facts should not abort the run.
### Fix
In `FetchAllJudgeFactsAsync`, extend error handling to catch `JsonException` (and optionally `NotSupportedException`) around `JsonSerializer.Deserialize`, log an informative message, and `continue` to the next category.
### Fix Focus Areas
- src/kapacitor/Commands/EvalCommand.cs[93-99]
- src/kapacitor/Commands/EvalCommand.cs[247-269]
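A standalone sketch of why the wider catch matters (hypothetical helper name, and reflection-based `Deserialize` for brevity where the PR's code uses the source-gen context):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Simulates the /api/judge-facts body coming back malformed: with
// JsonException handled the same way as HttpRequestException, the category is
// skipped instead of aborting eval startup.
static List<string> LoadFactsOrEmpty(string body, string category) {
    try {
        return JsonSerializer.Deserialize<List<string>>(body) ?? new List<string>();
    } catch (JsonException ex) {
        Console.WriteLine($"Could not load judge facts for {category}: {ex.Message}");
        return new List<string>();   // non-fatal: same as "no retained facts"
    }
}

var good = LoadFactsOrEmpty("[\"fact-a\",\"fact-b\"]", "git");
var bad  = LoadFactsOrEmpty("<html>502 Bad Gateway</html>", "git");
Console.WriteLine($"good={good.Count} bad={bad.Count}");   // good=2 bad=0
```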
Summary
Third leg of DEV-1434. The CLI now:
- fetches retained judge facts at eval startup (`GET /api/judge-facts?category=<cat>`, × 4) and injects them into each judge prompt as "known patterns";
- parses an optional `retain_fact` from each judge response and POSTs novel facts back to the server for future runs.

The prompt template grew a "Known patterns" section and a `retain_fact` field in the response schema, with strict guidance on when to emit one — only generalizable patterns ("User force-pushes with uncommitted work"), never single observations ("Ran rm -rf /tmp/cache"), to stop the retained-fact list from degenerating into noise.
Design notes
Also cleaned up: four pre-existing `TUnitAssertions0015` warnings in `SetupCommandTests` (`.IsEqualTo(true)` → `.IsTrue()`).
Test plan
🤖 Generated with Claude Code